🎯 Tensor Cores - miterion · Scour

Reducing the Computational Cost Scaling of Tensor Network Algorithms via Field-Programmable Gate Array Parallelism

arxiv.org·3d

🏎️TensorRT

ggml: backend-agnostic tensor parallelism by JohannesGaessler · Pull Request #19378

github.com·3d·

Discuss: r/LocalLLaMA

Local-First AI: How SLMs are Fixing the Latency Gap 💻✨

dev.to·4h·

Discuss: DEV

⚡Flash Attention

Main Content || Math ∩ Programming

jeremykun.com·7h

📉Model Quantization

Writing a ONNX Neural Network Inference Engine from Scratch in C to run image classification with MobileNetV2

flexw.github.io·11h·

Discuss: r/C_Programming

⚡ONNX Runtime

🥇Top AI Papers of the Week

nlp.elvissaravia.com·15h

⚡ONNX Runtime

Hello Edge: Keyword Spotting on Microcontrollers

paperium.net·2d·

Discuss: DEV

🧩Attention Kernels

Show HN: Model Training Memory Simulator

czheo.github.io·20h·

Discuss: Hacker News

📊Gradient Accumulation

Heterogeneous Processing: A Strategy for Augmenting Moore's Law (2006)

linuxjournal.com·16h·

Discuss: Hacker News

⚡CUDA Programming Patterns

Fast Autoscheduling for Sparse ML Frameworks

ajroot.pl·4d·

Discuss: Hacker News, r/Compilers

⚡ONNX Runtime

FloorplanVLM: A Vision-Language Model for Floorplan Vectorization

arxiv.org·1h

How Anam Achieved 250% Faster Inference Using Zymtrace Continuous GPU Profiling

zymtrace.com·4h

Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory

news.ycombinator.com·1d·

Discuss: Hacker News

Quantization-Aware Distillation

ternarysearch.blogspot.com·1d·

Discuss: Hacker News

📉Model Quantization

Energy‑Efficient Sparse Coding in Neuromorphic Hardware for Autonomous Drones:

freederia.com·3d

⚡Flash Attention

Three AI engines walk into a bar in single file...

theregister.com·15h

🤖AI Coding Tools

How I squeezed a BERT sentiment analyzer into 1GB RAM on a $5 VPS

mohammedeabdelaziz.github.io·1d·

Discuss: Hacker News

🏎️TensorRT

Understanding LLM Inference Engines: Inside Nano-vLLM (Part 2)

neutree.ai·2d·

Discuss: Hacker News

Pulse‑Sequence Tuning for Fault‑Tolerant Exponentiation in Shor’s Algorithm on Transmon Qubits

dev.to·1d·

Discuss: DEV

⏱️Benchmarking

NVIDIA VibeTensor: AI Just Built Its Own Deep Learning Engine… And It Actually Works (AI Revolution

youtube.com·20h

🏎️TensorRT
